cients of panel time series, i.e., of multiple time series, when each has a small number 
of observations. These tests can determine the acceptance or the rejection of each 
hypothesis individually while controlling the average type one error. Strikingly, the 

1 Introduction 


Testing a unit root hypothesis is a very important subject since, for example, it can 
be applied to test the Purchasing Power Parity (PPP) theory in Economics. The 
topic attracts much attention in research also because the traditional unit root tests 
can have low power in some circumstances. One such circumstance relates to the 
panel time series which consists of N series to be simultaneously tested, each having 
T observations, where N is large or moderate and T is small. This is the scenario we 
consider in this paper. Although we focus on the Economics settings when discussing 
applications, our solutions are applicable to other applications involving the “small n 
and large p” problems in Statistics. 

In this paper, we consider a new approach to derive several novel tests that 
improve, in power, on the traditional t-test for testing multiple unit root null hypotheses 
and white noise null hypotheses. This new approach is based on the optimal multiple 
test criterion (Liu, 2006, Storey, 2007, Storey et al., 2007, Hwang and Liu, 2010, 
and Noma and Matsui, 2012 and 2013), which is motivated by biological microarray 
data analyses. This approach can determine acceptance or rejection of each 
hypothesis individually while controlling the average type one error of the multiple 
tests. Traditionally, the panel unit root approach tests the null hypothesis 
that all series follow a unit root model (Levin et al., 1992, Baltagi and Kao, 2000, 
Bai and Ng, 2004, 2010, Pesaran, 2007, Pesaran et al., 2013, etc.). Hence either 
all series are declared to follow a unit root model (i.e., to be non-stationary) or some 
are declared stationary without identifying which. In contrast, the tests proposed in this 
paper can determine the stationarity of each series while controlling the 
average type one error. This seems more desirable. 

The multiple test criterion considered by Liu (2006), Storey (2007), Hwang and 
Liu (2010), and Noma and Matsui (2012, 2013) is to maximize the average power 
while controlling the average type one error. Interestingly, such a criterion is 
equivalent to other optimality criteria based on controlling the false discovery rate (FDR). 
See Storey (2007) and Hwang and Liu (2010). Their approaches and the approach 
in this paper all use the Neyman-Pearson fundamental lemma to derive optimum 
procedures. 

While all aim at controlling the average Frequentist type one error, the difference 
between the approach of Storey (2007) and those of Hwang and 
Liu (2010) and Noma and Matsui (2012, 2013) is that Storey's approach aims to 
maximize the Frequentist average power whereas the others aim to maximize the 
Bayesian average power. The advantage of the approaches of Hwang and Liu (2010) 
and Noma and Matsui (2012, 2013) over Storey's approach is that the former are much 
faster in computation and also, as shown in Hwang and Liu (2010), provide higher 
average test power. While the approaches of Hwang and Liu (2010) and Noma and 
Matsui (2012, 2013) are similar, the statistics proposed by Hwang and Liu have 
further approximated formulae in simpler forms, which can be easily calculated without 
evaluating integrals, unlike Noma and Matsui's approach, which requires evaluating 
3N integrals. 

In this paper, we tackle the difficult problem of testing coefficients of time series 
models. We follow the approach of Hwang and Liu (2010), which constructs the MAP 
test, i.e., the test that maximizes the Bayesian expected average power with respect 
to a prior distribution while controlling the Frequentist average type one error. The 
general theory developed in Section 3 shows that the MAP test is an approximation of 
Storey's test. To derive the statistics for testing the one-sided and two-sided hypotheses 
of the coefficients of panel AR(1) models, we assume a class of priors on the means, a 
class of priors on the variances, or on both, resulting in a MAP statistic shrinking the 
means, the variances or both, respectively. These statistics are further approximated, 
leading to the proposed statistics. Strikingly, in all situations, the proposed statistics 
basically take a simple form similar to the t-statistic; the only difference is that the 
means and the variances are estimated by shrinkage estimators. Previously, such a 
result was available in Hwang and Liu (2010), Cui et al. (2005), and Smyth (2004) 
only for the usual ANOVA models and only for the procedure shrinking the variances. 
For the procedure shrinking the means and the variances, the tests of Hwang and Liu 
(2010) were not put in the form of the t-statistic. 

Our proposed shrinkage t-tests are shown to have higher average powers than the 
traditional t-test because the tests “borrow the strength” from all series. The tests 
implicitly determine how similar the parameters of the series are. The more similar 
the series are, the more extensively the data from other cross sections are used to 
estimate the parameters of the individual series. Consequently, the improvements are 
larger. 

Note that the testing statistics developed in this paper aim to satisfy the 
Frequentist criterion of controlling the average type one error, even though we use a Bayesian 
approach to construct the proposed statistics. To be more realistic, we consider not 
only a prior but a class of priors indexed by some hyper-parameters. We use the data 
to estimate the hyper-parameters; hence the procedure is called empirical Bayes, 
which is equivalent to the Frequentist approach based on a random effect model. 
Hence our results are quite different from the Bayesian unit root tests proposed in 
Uhlig (1994) and Phillips and Xiao (1998). 

This paper uses a bootstrap method to obtain the critical value to control the 
average type one error of the proposed tests. Simulations in Section 7 show that 
bootstrap works well for all our settings. Note that our problem is different from 
the unit root bootstrap tests proposed by Ferretti and Romo (1996) and Park (2003), 
which aim at testing a single hypothesis with a large number, T, of observations. 
Simulation results also show that the proposed tests have either higher or similar average 
power when compared with the t-tests in all the cases we considered. Specifically, 
when N = 80 and T = 10, the proposed tests increase the average power of the t-test by 
70% and 25%, respectively, for testing the white noise null hypotheses and the unit 
root null hypotheses. We also demonstrate similar improvement when the model is 
misspecified and when the cross section series are dependent. In this paper, although 
we only work on AR(1) models, we anticipate that these results can be generalized 
to more complex time series models. 

Our proposed tests are fast in computation. Given a data set with N = 1000 and 
T = 10, it takes about 10 seconds to compute our proposed tests (Fss and RFss) for 
all 1000 hypotheses, using a Laptop and the GAUSS 9.0 program. 

The rest of this paper is organized as follows. In Section 2, we present the model 
considered in this research and give a review of the optimal discovery procedure 
(Storey, 2007) and the maximizing average power test (MAP) (Hwang and Liu, 2010). 
In Section 3, we develop a theory that links the two approaches. In Sections 4 and 
5, we derive the MAP test under various prior assumptions for the two-sided and the 
one-sided hypotheses. The proposed empirical Bayes tests are constructed in Section 
6, where the issues of estimating the hyper-parameters and controlling the average 
type one error by bootstrap are discussed. In Section 7, we present the simulation 
results. Section 8 gives the concluding remarks. 





2 The Model and Reviews of the ODP and MAP 
Tests 


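Suppose the $N$-dimensional AR(1) processes are generated by 

$$ y_{j,t} = \phi_j y_{j,t-1} + \epsilon_{j,t} \quad \text{for } 1 \le j \le N \text{ and } 1 \le t \le T, \tag{1} $$

where for section $j$, $\epsilon_{j,t}$ is an i.i.d. normal random variable with zero mean and 
variance $\sigma_j^2$. Note that we allow the dependence of the cross section series in model 
(1). Except in Sections 6 and 7, we derive the tests without assuming that the cross 
section series are independent throughout the paper. 

The observation of the $j$-th section is $y_j = (y_{j,1}, \ldots, y_{j,T})'$, and the probability 
density function (pdf) of $y_j$ given $y_{j,1}$ is 

$$ f(y_j|\phi_j,\sigma_j^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma_j}\right)^{T-1} \exp\left(-\frac{\sum_{t=2}^{T}(y_{j,t}-\phi_j y_{j,t-1})^2}{2\sigma_j^2}\right). \tag{2} $$

It is easy to see that 

$$ \sum_{t=2}^{T}(y_{j,t}-\phi_j y_{j,t-1})^2 = \sum_{t=2}^{T}(y_{j,t}-\hat\phi_j y_{j,t-1})^2 + (\hat\phi_j-\phi_j)^2 \sum_{t=2}^{T} y_{j,t-1}^2, $$

where 

$$ \hat\phi_j = \frac{\sum_{t=2}^{T} y_{j,t}\,y_{j,t-1}}{\sum_{t=2}^{T} y_{j,t-1}^2} \tag{3} $$

is the least squares estimator obtained by regressing the $y_{j,t}$'s against the $y_{j,t-1}$'s. Using the notation 

$$ \hat\sigma_j^2 = \frac{1}{T-1}\sum_{t=2}^{T}(y_{j,t}-\hat\phi_j y_{j,t-1})^2 \quad \text{and} \quad S_j = \sum_{t=2}^{T} y_{j,t-1}^2, \tag{4} $$

the pdf of $y_j$ can be written as 

$$ f(y_j|\phi_j,\sigma_j^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma_j}\right)^{T-1} \exp\left(-\frac{(T-1)\hat\sigma_j^2}{2\sigma_j^2}\right) \exp\left(-\frac{S_j(\hat\phi_j-\phi_j)^2}{2\sigma_j^2}\right). \tag{5} $$

It follows that $(\hat\phi_j, \hat\sigma_j^2, S_j)$ is a sufficient statistic. Also $(\hat\phi_j, \hat\sigma_j^2)$ is the maximum 
likelihood estimator of $(\phi_j, \sigma_j^2)$. 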
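The quantities in (3) and (4) can be computed per series with a few sums. The following is a minimal sketch using only the Python standard library; the function names `ar1_stats` and `simulate_ar1`, the seed, and the choice of drawing $y_{j,1}$ as a pure shock are assumptions made for illustration and are not taken from the paper.

```python
import random

def ar1_stats(y):
    """Return (phi_hat, sigma2_hat, S) for one series, conditioning on y[0]."""
    lagged, current = y[:-1], y[1:]
    S = sum(l * l for l in lagged)                               # S_j in (4)
    phi_hat = sum(c * l for c, l in zip(current, lagged)) / S    # least squares, (3)
    rss = sum((c - phi_hat * l) ** 2 for c, l in zip(current, lagged))
    sigma2_hat = rss / (len(y) - 1)                              # divisor T - 1, (4)
    return phi_hat, sigma2_hat, S

def simulate_ar1(phi, sigma, T, rng):
    """Hypothetical generator: y_1 is drawn as a pure shock (an assumption)."""
    y = [rng.gauss(0.0, sigma)]
    for _ in range(T - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y

# A small panel in the spirit of the paper's setting: N = 80 series, T = 10.
rng = random.Random(0)
panel = [simulate_ar1(0.5, 1.0, 10, rng) for _ in range(80)]
stats = [ar1_stats(y) for y in panel]
```

Each entry of `stats` holds the triple $(\hat\phi_j, \hat\sigma_j^2, S_j)$, which is all that the test statistics discussed later require from series $j$.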

In this paper, we consider testing simultaneously the $N$ hypotheses 

$$ H_0^j : \phi_j = \phi_0 \quad \text{vs.} \quad H_1^j : \phi_j \in D, \tag{6} $$

where $\phi_0$ is a fixed number, $j = 1, \ldots, N$, and $D$ denotes a set of $\phi_j$'s. When $D$ 
consists of a single point, (6) corresponds to a simple test. When $D = \{\phi \,|\, \phi \ne \phi_0\}$, 
(6) corresponds to a two-sided test and when $D = \{\phi \,|\, \phi < \phi_0\}$, (6) corresponds to a 
one-sided test. 

The optimal discovery procedure (ODP) of Storey (2007) aims at constructing 
a rejection region that maximizes the expected number of true positive results 
(ETP) while controlling the expected number of false positive results (EFP). The 
discussion below is applicable to the general situation, where $y_j$, a $T$-dimensional 
random vector, has pdf $f(y_j|\phi_j, \sigma_j)$. The procedures we consider are to 

$$ \text{reject } H_0^j \text{ if } y_j \in C, \tag{7} $$

where $C$, independent of $j$, is a set of $T$-dimensional vectors. Storey (2007) argues 
that procedures using $C$ depending on $j$ have no advantage in average power. Using 
$I(\cdot)$ to denote an indicator function, we have 

$$ \mathrm{ETP} = \sum_{j=1}^{N} E\{ I(\phi_j \in D \text{ and } y_j \in C) \} = \int_{y \in C} \sum_{\{j|\phi_j \in D\}} f(y|\phi_j,\sigma_j)\,dy. \tag{8} $$

Similarly, 

$$ \mathrm{EFP} = \sum_{\{j|\phi_j = \phi_0\}} P_{\phi_0,\sigma_j}(y_j \in C) = \int_{y \in C} \sum_{\{j|\phi_j = \phi_0\}} f(y|\phi_0,\sigma_j)\,dy. \tag{9} $$


Let $N_1$ and $N - N_1$ be the number of series satisfying the alternative and the null 
hypotheses, respectively. Note that $\mathrm{ETP}/N_1$ and $\mathrm{EFP}/(N - N_1)$ are therefore the 
average power and the average type one error, which are quantities of great concern 
to a Frequentist. 

Applying the Neyman-Pearson lemma to (8) and (9) leads to the optimal test which 
maximizes ETP (or equivalently $\mathrm{ETP}/N_1$) while controlling $\mathrm{EFP}/(N - N_1)$ at level $\alpha$. 
Assume that the pdf of $y_j$ is $f(y_j|\phi_j, \sigma_j)$, where the $\phi_j$'s are the key parameters whereas 
the $\sigma_j$'s are the nuisance parameters, which can be interpreted as variances. See, for 
example, the pdf in (2). The rejection region is then $y_j \in C^*$, where 

$$ C^* = \left\{ y \;\Big|\; \frac{\sum_{\{j|\phi_j \in D\}} f(y|\phi_j,\sigma_j)}{\sum_{\{j|\phi_j = \phi_0\}} f(y|\phi_0,\sigma_j)} > \text{crit} \right\}, \tag{10} $$

where crit is a cutoff point chosen so that the test has average type one error equal to $\alpha$. 

There is, however, a problem with the “test” in (10). In order to apply (10), 
one needs to know which hypothesis is true and which is false, the very information 
that one is trying to determine. When the $\sigma_j$ are all equal (to $\sigma$), there is however 
a possible way out, as described in Storey (2007). Write the inequality in (10) as 
$A/D > \text{crit}$, where $A$ and $D$ represent the numerator and the denominator, both sums 
of probability density functions. Notice that the inequality in (10) is equivalent to 
$(A + D)/D > \text{crit} + 1$. Also $D$ equals $(N - N_1) \cdot f(y|\phi_0, \sigma)$. Putting all these together 
and omitting some constants, the statistic is equivalent to $\sum_{j=1}^{N} f(y|\phi_j, \sigma)/f(y|\phi_0, \sigma)$, 
which can be calculated without knowing which hypothesis is true or false. 

However, when the $\sigma_j$'s are different, it is much harder to approximate the statistic 
in (10). Storey et al. (2007) did have a successful attempt, where $\sigma_j$ is replaced by 
an estimator based on the $j$th population. Storey's procedure, however, is 
computationally very intensive. It requires calculating the statistic $N$ times, each involving a 
summation of $N$ terms. When $N$ is large, this is overwhelming. 

The approach of Hwang and Liu (2010) is more parametric because it postulates 
classes of prior distributions on both $\phi_j$ and $\sigma_j$. They construct their MAP (acronym 
of maximum average power) test to maximize the Bayesian expected value of ETP, 
which is the average power with respect to the prior distribution. Using some intuitive 
approximation in the empirical Bayes fashion, they were able to derive some statistics 
which can be calculated instantaneously. In particular, their approach leads to a 
statistic, called $F_{ss}$, not only borrowing the strength from all populations to estimate 
$\phi_j$ (which Storey's procedure does), but also to estimate $\sigma_j$ (which Storey's procedure 
does not). Consequently, it is to be expected that the $F_{ss}$ test has higher average power, 
which was numerically demonstrated to be so. 

In this paper, the approach of Hwang and Liu (2010) is applied to time series 
models, for which statistical tests are much more difficult to construct than for their ANOVA 
models. To the best of our knowledge, this paper is the first to present shrinkage 
multiple tests for the time series model. The empirical Bayes approach is equivalent 
to the random effect model approach, because the parameters are assumed to be 
random in either case. 


3 The Main Theorems 

In this section, we shall provide a general theory that shows that the ODP approach 
of Storey (2007) is asymptotically equivalent to the MAP test of Hwang and Liu 
(2010). We then apply the theory to the AR(1) model in the following sections. In 
the case when a theorem needs a proof, the proof is provided in the Appendix. In 
this section, we consider testing the hypothesis (6) by assuming that each $y_j$ has the 
pdf $f(y_j|\phi_j, \sigma_j)$. Under the assumption, we consider three cases of priors: 

Case 1: $\phi_j$ is fixed and the $\sigma_j$'s are i.i.d., each having the distribution $\pi_1(\sigma)$. 

Case 2: the $\sigma_j$'s are fixed and the $\phi_j$ are i.i.d., each having the distribution $\pi_2(\phi)$. 

Case 3: $(\phi_j, \sigma_j)$ are i.i.d., each having the distribution $\pi(\phi, \sigma)$. 


Considerations of these three cases shall, in order, lead to the $F_{sv}$, $F_{sm}$ and $F_{ss}$ 
tests for the two-sided hypothesis and to $RF_{sv}$, $RF_{sm}$ and $RF_{ss}$ for the one-sided 
hypothesis. The subscripts sv, sm, and ss represent tests that shrink the variances, 
shrink the means and shrink both the means and variances, respectively. 

We first write an asymptotic formula for a rejection region $C$ for Case 1, which 
rejects $H_0^j$ if and only if $y_j \in C$. 


Theorem 1 (Case 1). Under the assumption of Case 1, as $N_1$ and $N - N_1$ go to 
infinity, 

$$ \frac{\mathrm{ETP}}{N_1} - \mathrm{BETP} \to 0 \text{ in probability} \tag{11} $$

and 

$$ \frac{\mathrm{EFP}}{N - N_1} - \mathrm{BEFP} \to 0 \text{ in probability}, \tag{12} $$

where 

$$ \mathrm{BETP} = \frac{1}{N_1} \sum_{\{j|\phi_j \in D\}} \int P_{\phi_j,\sigma}(y \in C)\,d\pi_1(\sigma) $$

and 

$$ \mathrm{BEFP} = \int P_{\phi_0,\sigma}(y \in C)\,d\pi_1(\sigma). $$

The “B” in the notation of BETP and BEFP stands for “Bayes”. Actually, 
both quantities are also the Bayesian expected values of the Frequentist's quantities, 
$\mathrm{ETP}/N_1$ and $\mathrm{EFP}/(N - N_1)$. The MAP test of Hwang and Liu (2010) is defined 
as the test that maximizes the Bayesian expectation of the average power, BETP, 
among all tests such that 

$$ \mathrm{BEFP} \le \alpha. \tag{13} $$

Hence the theorem shows that the ODP approach of Storey (2007) is asymptotically 
equivalent to the MAP test of Hwang and Liu (2010). 




The proof of the above theorem using the law of large numbers is based on the 
assumption that the $\sigma_j$'s are independent. However, even if the $\sigma_j$'s are correlated, it is possible 
to write weaker conditions so that the law of large numbers applies and hence the 
theorem could be established under weaker assumptions. 

Next, by interchanging the order of integration, we can write BETP and BEFP as 

$$ \mathrm{BETP} = \int_{y \in C} \frac{1}{N_1} \sum_{\{j|\phi_j \in D\}} \int f(y|\phi_j,\sigma)\,d\pi_1(\sigma)\,dy $$

and 

$$ \mathrm{BEFP} = \int_{y \in C} \int f(y|\phi_0,\sigma)\,d\pi_1(\sigma)\,dy. $$

Therefore the Neyman-Pearson fundamental lemma implies the following theorem. 


Theorem 2 (Case 1). Among all the procedures in (7), the MAP test consists of 
$y$ such that 

$$ PT(\phi_1, \ldots, \phi_N) = \frac{\frac{1}{N_1}\sum_{\{j|\phi_j \in D\}} \int f(y|\phi_j,\sigma)\,d\pi_1(\sigma)}{\int f(y|\phi_0,\sigma)\,d\pi_1(\sigma)} > \text{crit}, \tag{14} $$

where crit is chosen so that equality in (13) is attained for the test (14). 


There is, however, a problem with PT in (14), which stands for “pseudo test”. 
Namely, it still depends on the unknown parameters $\phi$'s and is not really an applicable 
test. Following the principle of the likelihood ratio test, we can use the statistic 

$$ \sup_{\phi_1, \ldots, \phi_N \in D} PT(\phi_1, \ldots, \phi_N), \tag{15} $$

which leads to a real statistical test. Note we can view (15) as an approximate MAP 
test, since the $\phi$'s are replaced by the maximum likelihood estimators $\hat\phi$'s. The 
approximation is one of the best imaginable approximations, even when the sample 
sizes are small. 

We now state the theorem which, under a condition, gives us an explicit formula 
for (15). 

Theorem 3 (Case 1). Assume that the maximization of $f(y|\phi, \sigma)$ with respect to 
$\phi \in D$ is attained when $\phi = \hat\phi_M(y) \in D$ and $\hat\phi_M$ does not depend on $\sigma$. Then 
(15) equals 

$$ \frac{\int f(y|\hat\phi_M,\sigma)\,d\pi_1(\sigma)}{\int f(y|\phi_0,\sigma)\,d\pi_1(\sigma)}. \tag{16} $$
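For Case 2, similar to Theorems 1 and 2, we could establish results which are 
stated below while omitting the proof. 


Theorem 4 (Case 2). Under the assumption of Case 2, the statement in Theorem 
1 holds with 

$$ \mathrm{BETP} = \frac{1}{N_1} \sum_{\{j|H_0^j \text{ is false}\}} \int P_{\phi,\sigma_j}(y \in C)\,d\pi_2(\phi) $$

and 

$$ \mathrm{BEFP} = \frac{1}{N - N_1} \sum_{\{j|H_0^j \text{ is true}\}} P_{\phi_0,\sigma_j}(y \in C). $$

Also the MAP test for this case rejects $H_0^j$ if and only if $y_j \in C$, where $C$ consists of $y$ such that 

$$ \frac{\frac{1}{N_1}\sum_{\{j|H_0^j \text{ is false}\}} \int f(y|\phi,\sigma_j)\,d\pi_2(\phi)}{\frac{1}{N - N_1}\sum_{\{j|H_0^j \text{ is true}\}} f(y|\phi_0,\sigma_j)} > \text{crit}, \tag{17} $$

where crit is chosen so that the equality in (13) holds. 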


The above region is not a usable rejection region, since it depends on the unknown 
$\sigma$'s. To derive a useful version, we could use the likelihood ratio principle by taking 
the sup of the numerator and denominator of the ratio in (17). It is easy to see that 
the resultant ratio is proportional to 

$$ \frac{\int f(y|\phi,\hat\sigma_M)\,d\pi_2(\phi)}{f(y|\phi_0,\hat\sigma_0)}, \tag{18} $$

where $\hat\sigma_M$ maximizes $\int f(y|\phi, \sigma)\,d\pi_2(\phi)$ and $\hat\sigma_0$ maximizes $f(y|\phi_0, \sigma)$, since 
$\hat\sigma_M$ and $\hat\sigma_0$ do not depend on $j$. Now (18) is a real statistic, since it does not depend 
on the $\sigma$'s. We note here that the process of turning (17) into a real statistic can also 
be carried out with estimators which are different from the maximizers. Later on, it 
turns out that the closed form of $\hat\sigma_M$ cannot be easily derived for the AR(1) model 
and so we use a different estimator $\hat\sigma_*$ to substitute for $\sigma_j$ in (17). This leads to (18) 
with $\hat\sigma_M$ being replaced by $\hat\sigma_*$, i.e., 

$$ \frac{\int f(y|\phi,\hat\sigma_*)\,d\pi_2(\phi)}{f(y|\phi_0,\hat\sigma_0)}. \tag{19} $$

We finally come to the easiest case, Case 3. We state the following theorem and 
omit the proof, which is similar to that of Theorem 1. 
Theorem 5 (Case 3). Under the assumption of Case 3, the statement in Theorem 
1 holds with 

$$ \mathrm{BETP} = \int P_{\phi,\sigma}(y \in C)\,d\pi(\phi,\sigma) = \int_{y \in C} \int f(y|\phi,\sigma)\,d\pi(\phi,\sigma)\,dy $$

and 

$$ \mathrm{BEFP} = \int P_{\phi_0,\sigma}(y \in C)\,d\pi_1(\sigma) = \int_{y \in C} \int f(y|\phi_0,\sigma)\,d\pi_1(\sigma)\,dy. $$

Consequently, the MAP test rejects $H_0^j$ if $y_j \in C$ and 

$$ C = \left\{ y \;\Big|\; \frac{\int_{\phi \in D} f(y|\phi,\sigma)\,d\pi(\phi,\sigma)}{\int f(y|\phi_0,\sigma)\,d\pi_1(\sigma)} > \text{crit} \right\}, \tag{20} $$

where crit is chosen so that the equality in (13) holds. 


The likelihood ratio principle is not used here in deriving Theorem 5. 

4 The Two-sided Test 

4.1 The t-test 

To begin, we consider the two-sided tests 

$$ H_0^j : \phi_j = \phi_0 \quad \text{vs.} \quad H_1^j : \phi_j \ne \phi_0, \quad \text{where } j = 1, \ldots, N. \tag{21} $$

The well-known t-test is to reject $H_0^j$ if $t_j^2$ is larger than a critical value, where 

$$ t_j = (\hat\phi_j - \phi_0)\sqrt{S_j}/\hat\sigma_j. \tag{22} $$

This test is asymptotically optimal in power if we consider each hypothesis 
separately. However, tests with larger average power can be constructed as outlined in 
Section 3. Similar to Hwang and Liu (2010), we construct $F_{sv}$, $F_{sm}$, and $F_{ss}$, which 
shrink the variances (sv), the means (sm) and both the variances and the means (ss), 
respectively. 
4.2 The test shrinking the variances: $F_{sv}$ 

To shrink only the variances and not the means, we shall consider Case 1 in Section 
3. The pdf $f(y_j|\phi_j, \sigma_j)$ is given in (5) and $D = \{\phi : \phi \ne \phi_0\}$. From (5), note that 
the condition in Theorem 3 is satisfied with $\hat\phi_M = \hat\phi_j$, where $\hat\phi_j$ is defined in (3), unless 
$\hat\phi_M = \phi_0$. The latter situation occurs with zero probability and hence can be ignored. 
It may be easier for the future user to have formulas with $y$ replaced with $y_j$, which 
we will do. The statistic can then be used directly to determine whether to reject 
$H_0^j$. After substituting $y$ by $y_j$, the approximate MAP statistic (16) is identical to 

$$ \frac{\int f(y_j|\hat\phi_j,\sigma)\,d\pi_1(\sigma)}{\int f(y_j|\phi_0,\sigma)\,d\pi_1(\sigma)}. \tag{23} $$

Under the assumption that $\sigma_j^2$ has a log-normal distribution with mean $\mu_v$ and 
variance $\tau_v^2$ as in Hwang and Liu (2010), we further approximate (23) by substituting 
$\sigma$ by its Bayes estimator; more precisely, $\ln(\sigma^2)$ is substituted by $E(\ln(\sigma^2)|\text{data})$. 

After estimating $\mu_v$ and $\tau_v^2$ by data in the empirical Bayes fashion, we end up with 
the rejection region 

$$ F_{sv} = t_{jE}^2 > \text{crit}, \quad \text{where } t_{jE} = (\hat\phi_j - \phi_0)\sqrt{S_j}/\hat\sigma_{jE}, \tag{24} $$

and $\hat\sigma_{jE}$ shall be defined below. We note that the critical value, crit, shall be 
determined using a bootstrap method so that the average Frequentist type one error is bounded 
by $\alpha$. The method is applied to all the other tests considered in this paper. We do it 
this way so the proposed tests have Frequentist validity. 

Note that $t_{jE}$ is the same as $t_j$ with the exception that $\hat\sigma_j^2$ in (22) is replaced by 
$\hat\sigma_{jE}^2$, which was proposed by Cui et al. (2005). To define $\hat\sigma_{jE}^2$, we take the logarithmic 
transformation of $\hat\sigma_j^2$ and apply the Lindley-James-Stein estimator to estimate $\ln(\sigma_j^2)$. 
See Lindley (1962) and James and Stein (1961). We then use the exponential Lindley- 
James-Stein estimator to estimate $\sigma_j^2$. Let $X_j = \ln(\hat\sigma_j^2) - E(\ln(\chi^2_{T-2}/(T-2)))$, where 
$\chi^2_{T-2}$ is a Chi-squared random variable with $T - 2$ degrees of freedom. Hence $\hat\sigma_{jE}^2$ is 
the exponential Lindley-James-Stein estimator, i.e. 

(25) 

where 
is the Lindley-James-Stein estimator and $V_{T-2}$ is the variance of $\ln(\chi^2_{T-2}/(T-2))$. 

Numerical studies in Section 7 show that (24) has a larger average power than 
the t-test. To explain this, note that when the $\sigma_j^2$ are very different from each other, 
$\sum_j (X_j - \bar X)^2$ is large. Consequently, the Lindley-James-Stein estimator is close to $X_j$ 
and hence $\hat\sigma_{jE}^2$ is close to $\hat\sigma_j^2$ up to a constant. Therefore (24) behaves like the t-test 
and cannot be worse. On the other hand, if the $\sigma_j^2$ are close to each other, resulting in a 
small $\sum_j (X_j - \bar X)^2$, the Lindley-James-Stein estimators are close to $\bar X$ and the $\hat\sigma_{jE}^2$ 
are close to the geometric mean of the $\hat\sigma_j^2$. Since the $\sigma_j^2$ are likely similar 
to each other, the geometric mean should be a better estimator than $\hat\sigma_j^2$. Hence test 
(24) is expected to have a larger average power than the t-test, which is confirmed 
by numerical results. 

The Lindley-James-Stein estimator can be derived nonparametrically. Hence it 
is anticipated that the test of Cui et al. (2005) is robust with respect to the 
misspecification of the distribution of $\sigma_j^2$. The conjecture is supported by the 
numerical study therein. 
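The log-scale shrinkage described above can be sketched in a few lines. The sketch below assumes a positive-part Lindley-James-Stein form with shrinkage constant $(N-3)V_{T-2}$, in the style of Cui et al. (2005); that specific form, the function names, and the series-based `digamma` and `trigamma` helpers are assumptions for illustration, not the paper's stated formula. It uses the standard facts $E\ln(\chi^2_\nu/\nu) = \psi(\nu/2) + \ln(2/\nu)$ and $\mathrm{Var}\ln(\chi^2_\nu/\nu) = \psi'(\nu/2)$.

```python
import math

def digamma(x):
    """psi(x) by upward recurrence plus an asymptotic series (x > 0)."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """psi'(x) by upward recurrence plus an asymptotic series (x > 0)."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + 1.0 / x + 0.5 * inv2 + (inv2 / x) * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))

def shrink_variances(sigma2_hat, T):
    """Exponential positive-part LJS shrinkage of variances (assumed form)."""
    nu = T - 2
    bias = digamma(nu / 2.0) + math.log(2.0 / nu)    # E[ln(chi2_nu / nu)]
    V = trigamma(nu / 2.0)                           # Var[ln(chi2_nu / nu)]
    X = [math.log(s) - bias for s in sigma2_hat]     # bias-corrected log variances
    N = len(X)
    xbar = sum(X) / N
    ss = sum((x - xbar) ** 2 for x in X)
    # Weight near 1 keeps X_j; weight 0 pulls every series to the common mean.
    weight = max(0.0, 1.0 - (N - 3) * V / ss) if ss > 0 else 0.0
    return [math.exp(xbar + weight * (x - xbar)) for x in X]
```

With the shrunken variances in hand, $t_{jE}$ is obtained from $t_j$ by dividing by the square root of the shrunken variance instead of $\hat\sigma_j$. When the input variances are nearly equal the weight collapses to zero, which is the geometric-mean behaviour discussed in the text.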


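4.3 The test shrinking the means: $F_{sm}$ 

Now we consider the test that shrinks the means only. Case 2 in Section 3 is assumed 
and hence Theorem 4 is applicable. 

To apply (19) to the AR(1) model, we consider a normal prior: 

$$ \phi_j \sim N(\mu, \tau^2) \quad \text{when } H_1^j \text{ is true}. \tag{26} $$

Now we shall evaluate the denominator and the numerator of (19), where $f(\cdot|\cdot)$ is 
defined in (2). The denominator of (19), with $y$ being replaced by $y_j$, equals 

$$ \sup_\sigma f(y_j|\phi_0,\sigma) = \sup_\sigma \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{T-1} \exp\left(-\frac{(T-1)\hat\sigma_{0j}^2}{2\sigma^2}\right), \tag{27} $$

where 

$$ \hat\sigma_{0j}^2 = \frac{1}{T-1}\sum_{t=2}^{T}(y_{j,t} - \phi_0 y_{j,t-1})^2. \tag{28} $$

Direct calculation shows that the maximum occurs at $\sigma^2 = \hat\sigma_{0j}^2$. Plugging this into (27) 
shows that (27) equals 

$$ \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma_{0j}}\right)^{T-1} e^{-(T-1)/2}. \tag{29} $$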


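Now to calculate the numerator of (19), we first calculate $\int f(y_j|\phi, \sigma)\,d\pi_2(\phi)$, which 
can be shown after some direct calculations to equal 

$$ \left[\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{T-1} \exp\left(-\frac{(T-1)\hat\sigma_j^2}{2\sigma^2}\right)\right] \sqrt{\frac{\sigma^2}{S_j\tau^2+\sigma^2}} \exp\left(-\frac{S_j(\hat\phi_j-\mu)^2}{2(S_j\tau^2+\sigma^2)}\right). \tag{30} $$

The last expression can be derived using (5), rewriting the second exponential term 
in (5) as $\sqrt{2\pi\sigma^2/S_j}$ times the $N(\phi, \sigma^2/S_j)$ density evaluated at $\hat\phi_j$, and using the 
classical Bayesian theory which implies 
that $\hat\theta$ has a $N(\mu, \sigma^2 + \tau^2)$ distribution if $\hat\theta|\theta \sim N(\theta, \sigma^2)$ and $\theta \sim N(\mu, \tau^2)$. 

Now it seems difficult to find the maximum likelihood estimator of $\sigma_j^2$. Hence 
instead we use $\hat\sigma_j^2$ (defined in (4)), which maximizes the bracket in (30). Hence when 
$\hat\sigma_*$ is taken to be $\hat\sigma_j$, (19) now equals the ratio of (30) to (29), which yields 

$$ \left(\frac{\hat\sigma_{0j}^2}{\hat\sigma_j^2}\right)^{(T-1)/2} \sqrt{\frac{\hat\sigma_j^2}{S_j\tau^2+\hat\sigma_j^2}} \exp\left(-\frac{S_j(\hat\phi_j-\mu)^2}{2(S_j\tau^2+\hat\sigma_j^2)}\right). \tag{31} $$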

To gain some insight about how (31) works, we show that it can be approximated 
by a formula similar to the t-test. Using the approximation 
$(\hat\sigma_{0j}^2/\hat\sigma_j^2)^{(T-1)/2} = (1 + (\hat\phi_j - \phi_0)^2 S_j/((T-1)\hat\sigma_j^2))^{(T-1)/2} \approx \exp(S_j(\hat\phi_j - \phi_0)^2/(2\hat\sigma_j^2))$ 
when $T$ is large, we express (31) as 

$$ \sqrt{\frac{\hat\sigma_j^2}{S_j\tau^2+\hat\sigma_j^2}} \exp\left(\frac{S_j}{2\hat\sigma_j^2}(\hat\phi_j-\phi_0)^2 - \frac{S_j(\hat\phi_j-\mu)^2}{2(S_j\tau^2+\hat\sigma_j^2)}\right). \tag{32} $$

According to our numerical study, the more important factor in (32) is the 
exponential part and not the square root factor. Omitting the square root factor and 
taking a log transformation of the remainder of (32) lead to 

$$ \frac{S_j}{2\hat\sigma_j^2}(\hat\phi_j-\phi_0)^2 - \frac{S_j(\hat\phi_j-\mu)^2}{2(S_j\tau^2+\hat\sigma_j^2)} = \frac{S_j}{2\hat\sigma_j^2\beta_j^2}(\phi_j^*-\phi_0)^2 + g(S_j,\hat\sigma_j,\mu,\tau), \tag{33} $$

where 

$$ \phi_j^* = \beta_j^2\hat\phi_j + (1-\beta_j^2)\mu \quad \text{and} \quad \beta_j^2 = \frac{\tau^2 S_j}{\tau^2 S_j + \hat\sigma_j^2}. \tag{34} $$

Because $g(\cdot)$ is a term not involving $\hat\phi_j$, it should be less relevant to the key parameter 
$\phi_j$ of the testing problem. Numerical evidence also suggests that we could ignore the 
term, which we will do. This leads to the proposed test, which has a formula similar 
to the t-test: 

$$ F_{sm} = t_{jm}^2, \quad \text{where } t_{jm} = \frac{(\phi_j^* - \phi_0)\sqrt{S_j}}{\hat\sigma_j\beta_j}. \tag{35} $$

Note that $F_{sm}$ uses the estimator $\phi_j^*$, which shrinks $\hat\phi_j$ toward $\mu$. In application, 
the hyper-parameters $\mu$ and $\tau$ are unknown. Hence in Section 6 we use the data to 
estimate them in the empirical Bayes fashion. 

The formula (35) works only for $\tau > 0$. Later on, if $\tau$ is estimated to be zero, 
(32) is used instead. This principle applies to all the proposed tests of this paper. 

Our numerical studies show that $F_{sm}$ is a reasonable approximation of (31) even 
when the sample size is as small as $T = 10$. Also, the numerical results in Section 7 
indicate that $F_{sm}$ has higher average power than the t-test. 
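As a numerical check on the algebra behind the mean-shrinking statistic, the sketch below computes the shrunken estimate and the t-type statistic, assuming the squared weight $\beta_j^2 = \tau^2 S_j/(\tau^2 S_j + \hat\sigma_j^2)$ and $\phi_j^* = \beta_j^2\hat\phi_j + (1-\beta_j^2)\mu$. It then compares half the squared statistic with the exponent obtained after the large-$T$ approximation. The function names and the numerical inputs are hypothetical, and the closed form used for the leftover term, $-(\mu-\phi_0)^2/(2\tau^2)$, is a derived observation under these assumptions rather than a formula quoted from the paper.

```python
import math

def f_sm(phi_hat, sigma2_hat, S, phi0, mu, tau2):
    """Return (phi_star, t_jm, F_sm); requires tau2 > 0."""
    beta2 = tau2 * S / (tau2 * S + sigma2_hat)         # squared shrinkage weight
    phi_star = beta2 * phi_hat + (1.0 - beta2) * mu    # estimate shrunk toward mu
    t_jm = (phi_star - phi0) * math.sqrt(S) / math.sqrt(sigma2_hat * beta2)
    return phi_star, t_jm, t_jm ** 2

def log_exponent(phi_hat, sigma2_hat, S, phi0, mu, tau2):
    """Exponent left after dropping the square-root factor (large-T form)."""
    return (S * (phi_hat - phi0) ** 2 / (2.0 * sigma2_hat)
            - S * (phi_hat - mu) ** 2 / (2.0 * (S * tau2 + sigma2_hat)))

# Hypothetical inputs for one series.
args = dict(phi_hat=0.4, sigma2_hat=1.3, S=7.0, phi0=1.0, mu=0.5, tau2=0.2)
phi_star, t_jm, F = f_sm(**args)
leftover = log_exponent(**args) - 0.5 * F   # the part not involving phi_hat
```

Under these assumptions `leftover` does not change with `phi_hat`, which is why dropping it leaves the ordering of the series by the statistic unchanged. Passing a shrunken variance in place of `sigma2_hat` gives the analogous statistic that shrinks both means and variances.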


4.4 The test shrinking the means and the variances: Fss 


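To produce a test shrinking both means and variances, we assume as in Case 3 of 
Section 3 that $(\sigma_j, \phi_j)$ follow the prior distribution $\pi(\sigma, \phi) = \pi_1(\sigma)\pi_2(\phi)$, where 
$\pi_2(\cdot)$ is the normal distribution defined in (26), and $\pi_1(\cdot)$ is the pdf of $\sigma$ with the 
distribution of $\sigma^2$ being defined right after (23). Applying Theorem 5 and (20) to 
model (5) and replacing $y$ by $y_j$ yields the MAP statistic: 

$$ \frac{\int\!\!\int f(y_j|\phi,\sigma)\,d\pi_2(\phi)\,d\pi_1(\sigma)}{\int f(y_j|\phi_0,\sigma)\,d\pi_1(\sigma)} = \frac{\int \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{T-1} \exp\left(-\frac{(T-1)\hat\sigma_j^2}{2\sigma^2}\right) \sqrt{\frac{\sigma^2}{S_j\tau^2+\sigma^2}} \exp\left(-\frac{S_j(\hat\phi_j-\mu)^2}{2(S_j\tau^2+\sigma^2)}\right) d\pi_1(\sigma)}{\int \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{T-1} \exp\left(-\frac{(T-1)\hat\sigma_{0j}^2}{2\sigma^2}\right) d\pi_1(\sigma)}, \tag{36} $$

where the numerator is derived using calculations similar to those leading to (30). Assume 
$\pi_1(\cdot)$ is a log-normal distribution with mean $\mu_v$ and variance $\tau_v^2$, as when we derived $F_{sv}$. 
Then we can approximate the MAP test by substituting $\sigma_j^2$ by its Bayes estimator 
$\hat\sigma_{jE}^2$ in the numerator and denominator of (36), and obtain an approximation of the 
MAP test: 

$$ \sqrt{\frac{\hat\sigma_{jE}^2}{S_j\tau^2+\hat\sigma_{jE}^2}} \exp\left(\frac{S_j}{2\hat\sigma_{jE}^2}(\hat\phi_j-\phi_0)^2 - \frac{S_j(\hat\phi_j-\mu)^2}{2(S_j\tau^2+\hat\sigma_{jE}^2)}\right). \tag{37} $$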


Similar to the calculations leading to (33), we ignore the first multiplicative term of (37) 
and take a log transformation of the remainder, yielding 

$$ \frac{S_j}{2\hat\sigma_{jE}^2}(\hat\phi_j-\phi_0)^2 - \frac{S_j(\hat\phi_j-\mu)^2}{2(S_j\tau^2+\hat\sigma_{jE}^2)} = \frac{S_j}{2\hat\sigma_{jE}^2\beta_{jE}^2}(\phi_{jE}^*-\phi_0)^2 + g^*(S_j,\hat\sigma_{jE},\mu,\tau), \tag{38} $$

where $\beta_{jE}^2 = \tau^2 S_j/(\tau^2 S_j + \hat\sigma_{jE}^2)$ and $\phi_{jE}^* = \beta_{jE}^2\hat\phi_j + (1 - \beta_{jE}^2)\mu$. Also $g^*(\cdot)$ is a term involving 
no $\hat\phi_j$. Note we do not need to redo the calculation in (33); we simply replace $\hat\sigma_j^2$ by $\hat\sigma_{jE}^2$ 
and $\beta_j$ by $\beta_{jE}$ in (33). 

Omitting $g^*(\cdot)$ of (38) yields the proposed test: 

$$ F_{ss} = t_{jEm}^2, \quad \text{where } t_{jEm} = \frac{(\phi_{jE}^* - \phi_0)\sqrt{S_j}}{\hat\sigma_{jE}\beta_{jE}}. \tag{39} $$

The expression of $F_{ss}$ not only has a compact formula similar to the t-test, but also 
enjoys nice interpretations. Compared with the t-test, $F_{ss}$ uses the shrinkage variance 
estimator $\hat\sigma_{jE}^2$ instead of $\hat\sigma_j^2$, and the shrinkage estimator $\phi_{jE}^*$ instead of $\hat\phi_j$. Therefore, 
$F_{ss}$ shrinks the variances as $F_{sv}$ does and shrinks the means as $F_{sm}$ does. Thus we 
would expect that $F_{ss}$ should perform the best among all the tests. Numerical studies 
in Section 7 confirm this expectation. 

Note that we do not need to assume a large $T$ in deriving (37) whereas we need it 
to derive (32). Thus $F_{ss}$ should be close to the MAP test even for small $T$. 


5 The One-sided Test 

We consider the one-sided test: for $1 \le j \le N$, 

$$ H_0^j : \phi_j = \phi_0 \quad \text{vs.} \quad H_1^j : \phi_j < \phi_0. \tag{40} $$

The t-test is to reject $H_0^j$ if $t_j = (\hat\phi_j - \phi_0)\sqrt{S_j}/\hat\sigma_j$ is smaller than a critical value. 

To construct tests having a larger average power, we derive $RF_{sv}$, $RF_{sm}$ and $RF_{ss}$, 
which shrink the variances, the means and both the variances and means, respectively. 
We include “R” in the names of these tests since $\phi_0$ in the null hypothesis is on the 
right-hand side of the alternative region. 

The test shrinking the variances: $RF_{sv}$ 

Suppose $\phi_j$ is fixed and unknown, and $\sigma_j$ follows the prior distribution, defined 
right after (23), which was used in deriving $F_{sv}$. Theorems 1-3 can be directly 
applied to this problem. Using the same arguments leading to $F_{sv}$, we end up 
with the test statistic: 

$$ RF_{sv} = t_{jE} = (\hat\phi_j - \phi_0)\sqrt{S_j}/\hat\sigma_{jE}. \tag{41} $$

The null hypothesis will be rejected if $RF_{sv}$ is smaller than a critical value. 


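The test shrinking the means: $RF_{sm}$ 

Suppose $\sigma_j$ is fixed and unknown and $\phi_j$ is a random variable. Since the 
alternative region is $\phi_j < \phi_0$, we postulate that the prior distribution is $N(\mu, \tau^2)$ truncated 
to the range $(-\infty, \phi_0)$. Hence its pdf is 

$$ \pi_2(\phi) = \frac{1}{\Phi((\phi_0-\mu)/\tau)} \frac{1}{\sqrt{2\pi}\,\tau} \exp\left(-\frac{(\phi-\mu)^2}{2\tau^2}\right) \quad \text{when } -\infty < \phi < \phi_0, \tag{42} $$

where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal distribution. By 
Theorem 4, an approximate MAP test is to reject $H_0^j$ if 

$$ \frac{\sup_{\sigma} \int_{-\infty}^{\phi_0} f(y_j|\phi,\sigma^2)\,d\pi_2(\phi)}{\sup_{\sigma} f(y_j|\phi_j = \phi_0,\sigma^2)} > \text{crit}. \tag{43} $$

Similar to the derivation of $F_{sm}$, a closed form for the denominator can be found 
by replacing $\sigma_j^2$ with its maximum point $\hat\sigma_{0j}^2$ defined in (28). However, it does 
not appear that the numerator has a closed form and hence we simply replace 
$\sigma_j^2$ with $\hat\sigma_j^2$ defined in (4). This leads to 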


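$$ \frac{1}{\Phi((\phi_0-\mu)/\tau)} \sqrt{\frac{\hat\sigma_j^2}{S_j\tau^2+\hat\sigma_j^2}} \left(\frac{\hat\sigma_{0j}^2}{\hat\sigma_j^2}\right)^{(T-1)/2} \exp\left(-\frac{S_j(\hat\phi_j-\mu)^2}{2(S_j\tau^2+\hat\sigma_j^2)}\right) \Phi(-t_{jm}), \tag{44} $$

where $t_{jm}$ is defined in (35). By adopting the arguments in (32) and (33) for 
deriving $F_{sm}$, the log transformation of 
$(\hat\sigma_{0j}^2/\hat\sigma_j^2)^{(T-1)/2} \exp(-S_j(\hat\phi_j-\mu)^2/(2(S_j\tau^2+\hat\sigma_j^2)))$ can be 
expressed approximately as $(1/2)t_{jm}^2$. Therefore, the MAP test can be approximated, after 
ignoring the first two terms of (44) and taking the log transformation of the 
remainder, by 

$$ \log(\Phi(-t_{jm})) + \frac{1}{2}t_{jm}^2. \tag{45} $$

Using the inequality $1 - \Phi(x) < \frac{1}{x}\varphi(x)$ for $x > 0$, where $\varphi$ is the standard normal 
pdf, (45) can be shown to be decreasing in $t_{jm}$ for $t_{jm} > 0$. It is obvious that (45) is also 
decreasing for $t_{jm} \le 0$. Hence (45) is equivalent to the proposed test which rejects when 

$$ RF_{sm} = t_{jm} = \frac{(\phi_j^* - \phi_0)\sqrt{S_j}}{\hat\sigma_j\beta_j} \text{ is small}. \tag{46} $$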
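The monotonicity claim for the function $t \mapsto \log\Phi(-t) + t^2/2$ can be checked numerically. The sketch below is only an illustration on a finite grid, not a proof; it uses the identity $\Phi(x) = \tfrac12\,\mathrm{erfc}(-x/\sqrt2)$ from the Python standard library, and the function names and grid are assumptions.

```python
import math

def log_norm_cdf(x):
    """log Phi(x), computed through the complementary error function."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))

def one_sided_stat(t):
    """log Phi(-t) + t^2 / 2, the quantity whose monotonicity is claimed."""
    return log_norm_cdf(-t) + 0.5 * t * t

grid = [-4.0 + 0.25 * k for k in range(49)]      # t from -4 to 8
values = [one_sided_stat(t) for t in grid]
is_decreasing = all(a > b for a, b in zip(values, values[1:]))
```

Because the function is strictly decreasing, rejecting for large values of it is the same as rejecting for small values of the t-type statistic itself, which is what makes the simpler statistic usable.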


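The test shrinking the variances and the means: $RF_{ss}$ 

Under the assumption of Case 3 in Section 3, suppose $(\sigma, \phi)$ follows a prior 
distribution $\pi(\sigma, \phi) = \pi_1(\sigma)\pi_2(\phi)$, where $\pi_1(\cdot)$ is the pdf of $\sigma$ with the 
distribution of $\sigma^2$ being defined right after (23) and $\pi_2(\cdot)$ is the truncated normal 
distribution defined in (42). Then the MAP test is 

$$ \frac{\int\!\!\int_{-\infty}^{\phi_0} f(y_j|\phi,\sigma^2)\,d\pi_2(\phi)\,d\pi_1(\sigma)}{\int f(y_j|\phi_0,\sigma^2)\,d\pi_1(\sigma)}. \tag{47} $$

Instead of integrating with respect to $\sigma$, we replace $\sigma_j$ in the numerator and 
denominator by $\hat\sigma_{jE}$. An approximation of the MAP test is obtained, 

$$ \frac{1}{\Phi((\phi_0-\mu)/\tau)} \sqrt{\frac{\hat\sigma_{jE}^2}{S_j\tau^2+\hat\sigma_{jE}^2}} \exp\left(\frac{S_j}{2\hat\sigma_{jE}^2}(\hat\phi_j-\phi_0)^2 - \frac{S_j(\hat\phi_j-\mu)^2}{2(S_j\tau^2+\hat\sigma_{jE}^2)}\right) \Phi(-t_{jEm}), \tag{48} $$

where $t_{jEm}$ is defined in (39). Ignoring the first two terms of (48), taking the 
log transformation of the remainder, and using the leading term in (38) to 
substitute for the exponent yield the proposed statistic: 

$$ RF_{ss} = t_{jEm} = \frac{(\phi_{jE}^* - \phi_0)\sqrt{S_j}}{\hat\sigma_{jE}\beta_{jE}}. \tag{49} $$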


6 Estimating the Hyper-parameters and the Critical Values 


6.1 Estimate the hyper-parameters: $\mu$ and $\tau^2$ 

The two-sided test: Normal distribution 

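We follow the empirical Bayes approach and use data to estimate the hyper-parameters 
$(\mu, \tau^2)$ used in $F_{SM}$ and $F_{SS}$. Suppose $\phi_j$ follows $N(\mu, \tau^2)$ with 
probability $\theta_1$, and $\phi_j = \phi_0$ with probability $\theta_0 = 1 - \theta_1$. Note that $\theta_1$ can be 
interpreted as $N_1/N$. 

By assuming the independence of $y_j$ for $j = 1, \ldots, N$, the log likelihood function 
of $(\mu, \tau^2)$ is 

$\log f(y_1, \ldots, y_N \mid \mu, \tau^2)$ 

$= \sum_{j=1}^{N} \log f(y_j \mid \mu, \tau^2)$ 

$= \sum_{j=1}^{N} \log\{\theta_1 \int f(y_j \mid \phi, \sigma_j)\, d\pi_2(\phi) + (1 - \theta_1) f(y_j \mid \phi_j = \phi_0)\}$ 

$= C + \sum_{j=1}^{N} \log\left\{\theta_1 \sqrt{\dfrac{\sigma_j^2}{S_j\tau^2 + \sigma_j^2}}\; e^{-\frac{S_j(\hat\phi_j - \mu)^2}{2(S_j\tau^2 + \sigma_j^2)}} + (1 - \theta_1)\, e^{-\frac{S_j(\hat\phi_j - \phi_0)^2}{2\sigma_j^2}}\right\}$ (50) 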


where $C$ is a constant not depending on $\mu$ and $\tau^2$. One can maximize (50) and 
derive the maximum likelihood estimator for $(\mu, \tau^2, \theta_1)$. However, this involves 
maximization over three variables. Instead, we propose an estimator which is 
easier to compute. We use the approximation $\hat\phi_j \sim N(\phi_j, \sigma_j^2/S_j)$. Hence 

$E(\hat\phi_j \mid \phi_j, \sigma_j) = \phi_j$ and $E(\hat\phi_j^2 \mid \phi_j, \sigma_j) = \phi_j^2 + \sigma_j^2/S_j$. (51) 

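Therefore, 

$E(\hat\phi_j) = \theta_1 E(\phi_j \mid \mu, \tau) + (1 - \theta_1)\phi_0 = \theta_1\mu + (1 - \theta_1)\phi_0$, 

$E(\hat\phi_j^2) = \theta_1 E(\phi_j^2 + \sigma_j^2/S_j \mid \mu, \tau) + (1 - \theta_1)(\phi_0^2 + E(\sigma_j^2/S_j))$ 

$= E(\sigma_j^2/S_j) + \theta_1(\mu^2 + \tau^2) + (1 - \theta_1)\phi_0^2$. (52) 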


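Let $m_1$ and $m_2$ denote $E(\hat\phi_j)$ and $E(\hat\phi_j^2)$, respectively. Solving for $\mu$ and $\tau^2$ in terms 
of $m_1$, $m_2$ and $\theta_1$ yields 

$\mu = \dfrac{m_1 - (1 - \theta_1)\phi_0}{\theta_1}$ and $\tau^2 = \dfrac{m_2 - (1 - \theta_1)\phi_0^2 - E(\sigma_j^2/S_j)}{\theta_1} - \mu^2$. (53) 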


Substitute $m_1$ and $m_2$ with $\hat m_1 = (1/N)\sum_{j=1}^{N}\hat\phi_j$ and $\hat m_2 = (1/N)\sum_{j=1}^{N}\hat\phi_j^2$. 
Furthermore, replace $E(\sigma_j^2/S_j)$ in (52) with $(1/N)\sum_{j=1}^{N}\sigma_j^2/S_j$, where $\sigma_j^2$ is, in turn, 
replaced with $\hat\sigma_{jE}^2$ for $F_{SS}$ (and $\hat\sigma_j^2$ for $F_{SM}$). The latter substitution for $\sigma_j^2$ is 
also applied to (50). Moreover, plug the resultant formulas for $\mu$ and $\tau^2$ into 
(50). Then the resultant pseudo likelihood function is a function of $\theta_1$ only. We 
then estimate $\theta_1$ by $\hat\theta_1$, which maximizes the function. Using $\hat\theta_1$, $\hat m_1$ and $\hat m_2$, we 
may estimate $\mu$ and $\tau^2$ based on (53). 
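
The moment-based estimation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the normal approximation $\hat\phi_j \sim N(\phi_j, \sigma_j^2/S_j)$, writes the pseudo log likelihood up to an additive constant under that approximation, and searches $\theta_1$ over a grid. The function names, the grid search and the small floor on $\tau^2$ are assumptions of the sketch.

```python
import math
import random

def moment_estimates(phi_hat, noise_var, phi0, theta1):
    """Solve the two moment equations for (mu, tau^2) at a fixed theta1.

    noise_var[j] stands for sigma_j^2 / S_j, the approximate variance of phi_hat[j].
    """
    n = len(phi_hat)
    m1 = sum(phi_hat) / n
    m2 = sum(p * p for p in phi_hat) / n
    mean_noise = sum(noise_var) / n  # sample version of E(sigma_j^2 / S_j)
    mu = (m1 - (1.0 - theta1) * phi0) / theta1
    tau2 = (m2 - mean_noise - (1.0 - theta1) * phi0 ** 2) / theta1 - mu ** 2
    return mu, max(tau2, 1e-8)  # the floor is an assumption to keep tau^2 positive

def pseudo_loglik(phi_hat, noise_var, phi0, theta1):
    """Two-component mixture log likelihood (up to a constant) as a function of theta1."""
    mu, tau2 = moment_estimates(phi_hat, noise_var, phi0, theta1)
    total = 0.0
    for p, v in zip(phi_hat, noise_var):
        alt = math.sqrt(v / (tau2 + v)) * math.exp(-0.5 * (p - mu) ** 2 / (tau2 + v))
        null = math.exp(-0.5 * (p - phi0) ** 2 / v)
        total += math.log(max(theta1 * alt + (1.0 - theta1) * null, 1e-300))
    return total

def estimate_hyperparameters(phi_hat, noise_var, phi0, grid_size=99):
    """Grid search over theta1 in (0, 1), then plug back into the moment equations."""
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    theta1 = max(grid, key=lambda th: pseudo_loglik(phi_hat, noise_var, phi0, th))
    mu, tau2 = moment_estimates(phi_hat, noise_var, phi0, theta1)
    return theta1, mu, tau2

if __name__ == "__main__":
    # Toy data: half of the series have phi_j = 0, half have phi_j ~ N(1, 0.2^2).
    rng = random.Random(0)
    phis = [rng.gauss(1.0, 0.2) if j < 500 else 0.0 for j in range(1000)]
    estimates = [rng.gauss(p, 0.1) for p in phis]
    print(estimate_hyperparameters(estimates, [0.01] * 1000, 0.0))
```

Reducing the search to the single variable $\theta_1$ is what makes the estimator cheap: for each candidate $\theta_1$ the values of $\mu$ and $\tau^2$ follow in closed form from the sample moments.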


The one-sided test: Truncated normal distribution 

To estimate the hyper-parameters $(\mu, \tau^2)$ of a truncated normal distribution used 
to derive $RF_{SM}$ or $RF_{SS}$, we adopt the empirical Bayes approach again, assuming 
that $\phi_j$ follows the truncated normal distribution with probability $\theta_1$ and 
$\phi_j = \phi_0$ with probability $1 - \theta_1$. Therefore, the log likelihood function of 
$(\mu, \tau^2)$ is 


= C + Ef=l 





+ (1 ~ 6^1)6 




(54) 


where is identical to tjm in but using cxj to replace dj. 

Using (51) and the moments of a truncated normal distribution, we have the 
following results after some calculations: 


where a = A(q:) = and (5(a) = A(a)(a -|- A(a)). Replacing E{^j) and 

E{(l)'j) by mi and m 2 in ([55]), respectively, gives us 


/i 




= { 


el 


(/i — A(a)r)^}/(1 — a6{a)). 


(56) 


Since the right-hand side of (56) still involves $\mu$ and $\tau^2$, an iteration algorithm 
is proposed to estimate $(\mu, \tau^2)$. We use the estimator for $(\mu, \tau^2)$ in the two-sided 
case described above as the initial value to obtain a function of $\theta_1$ only. 
Calculate the $\hat\theta_1$ that maximizes the function. Now plug $\hat\theta_1$ and the initial value 
of $(\mu, \tau^2)$ into the right-hand side of (56) to obtain a new estimator of $\mu$ and 
$\tau^2$. The process is repeated to obtain a new estimator of $\theta_1$, $\mu$ and $\tau^2$. In the 
above calculation $m_1$ and $m_2$ are replaced by $(1/N)\sum_{j=1}^{N}\hat\phi_j$ and $(1/N)\sum_{j=1}^{N}\hat\phi_j^2$, 
respectively. Also $E(\sigma_j^2/S_j)$ is replaced by $(1/N)\sum_{j=1}^{N}\sigma_j^2/S_j$, where $\sigma_j^2$ is, in turn, 
replaced by $\hat\sigma_{jE}^2$ for $F_{SS}$ (and $\hat\sigma_j^2$ for $F_{SM}$). The latter substitutions for $\sigma_j^2$ are 
also applied to (54). 
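
The truncated normal moments that enter this calculation can be checked with a few lines of code. The sketch below uses the standard formulas for a normal distribution truncated from above; the definitions $\alpha = (1 - \mu)/\tau$ and $\lambda(\alpha) = \phi(\alpha)/\Phi(\alpha)$ are assumptions made here for illustration, and the function names are hypothetical.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def truncated_normal_moments(mu, tau, upper=1.0):
    """Mean and variance of N(mu, tau^2) truncated to (-inf, upper)."""
    alpha = (upper - mu) / tau                          # standardized truncation point
    lam = std_normal_pdf(alpha) / std_normal_cdf(alpha)
    delta = lam * (alpha + lam)
    mean = mu - lam * tau                               # truncation from above lowers the mean
    var = tau * tau * (1.0 - delta)                     # and shrinks the variance
    return mean, var

if __name__ == "__main__":
    print(truncated_normal_moments(0.8, 0.3))
```

Both moments depend on $\mu$ and $\tau$ through $\alpha$, which is why the moment equations cannot be solved in closed form and an iteration is needed.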


6.2 Generating the critical value by the Bootstrap method 

In order to have a good finite sample property, we use the bootstrap method 
to determine the critical values of the proposed tests. In what follows, we present the 
details of the bootstrap procedure for the two-sided test. A similar procedure can be 
applied to the one-sided test. 

Let 

$\hat e_{j,t} = y_{j,t} - \hat\phi_j y_{j,t-1}$ for $2 \le t \le T$ and $1 \le j \le N$. (57) 


Under the null hypothesis, we use the hypothesized value $\phi_0$ to create the following 
bootstrap sample for the j-th group, $\{y^*_{j,t},\ t = 1, 2, \ldots, T\}$, where 

$y^*_{j,t} = \phi_0 y^*_{j,t-1} + e^*_{j,t}$, $t = 1, 2, \ldots, T$, (58) 

and the $e^*_{j,t}$ are sampled with replacement from $\{\hat e_{j,t},\ 2 \le t \le T\}$. 

One can plug the $y^*_{j,t}$'s into the t-statistic, $F_{SV}$ statistic, $F_{SM}$ statistic and $F_{SS}$ statistic. 
For each statistic, repeat this R times and calculate the percentile (the 95th percentile for a 5% test), 
which is then used as the critical value. 
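
A minimal sketch of this residual bootstrap for a single series and the ordinary t-statistic is given below. It is an illustration under stated assumptions, not the authors' code: the starting value $y^*_{j,0} = 0$, the use of $|t|$ for the two-sided test, and all function names are choices made for the sketch.

```python
import math
import random

def ar1_fit(y, phi0):
    """Least-squares fit of y_t = phi * y_{t-1} + e_t (no intercept).

    Returns the t-statistic for H0: phi = phi0 and the residuals.
    """
    lagged, current = y[:-1], y[1:]
    s = sum(v * v for v in lagged)
    phi_hat = sum(a * b for a, b in zip(lagged, current)) / s
    resid = [b - phi_hat * a for a, b in zip(lagged, current)]
    sigma2 = sum(e * e for e in resid) / (len(resid) - 1)
    return (phi_hat - phi0) / math.sqrt(sigma2 / s), resid

def bootstrap_critical_value(y, phi0, reps=1000, level=0.05, rng=None):
    """Upper percentile of |t| over bootstrap series generated under H0."""
    rng = rng or random.Random(0)
    _, resid = ar1_fit(y, phi0)
    stats = []
    for _rep in range(reps):
        prev, y_star = 0.0, []  # assumption: the bootstrap series starts from zero
        for _t in range(len(y)):
            prev = phi0 * prev + rng.choice(resid)  # resample residuals with replacement
            y_star.append(prev)
        stats.append(abs(ar1_fit(y_star, phi0)[0]))
    stats.sort()
    index = min(reps - 1, max(0, math.ceil((1.0 - level) * reps) - 1))
    return stats[index]

if __name__ == "__main__":
    data_rng = random.Random(1)
    series = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
    print(bootstrap_critical_value(series, 0.0, reps=2000))
```

The same loop applies to the other statistics by swapping the function that maps a bootstrap series to a statistic.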

Note that in calculating the critical values for $F_{SM}$ and $F_{SS}$, we use data to estimate 
$\mu$ and $\tau$ once, and from then on $\mu$ and $\tau$ are set to their estimated values. 
Hence in each bootstrap sample, $\mu$ and $\tau$ are not re-estimated. This is reasonable 
since in the bootstrap sample, $\phi_j$ is taken to be $\phi_0$, the hypothesized value. The 
bootstrap samples do not have information about $\phi_j$ and hence they should not be 
used to estimate the hyper-parameters of $\phi_j$. Regarding $\hat\sigma_{jE}$ used in the two tests $F_{SV}$ 
and $F_{SS}$, we do recalculate its value for each bootstrap sample, since the bootstrap samples contain 
information about $\sigma_j$. 


7 Simulation Studies 

7.1 The white noise hypothesis: Two-sided test for $\phi_0 = 0$ 

This simulation considers a special case of the two-sided test in which the null and the 
alternative hypotheses are, respectively, $H_0$: $\phi_j = 0$ and $H_1$: $\phi_j \ne 0$ for $1 \le j \le N$. 
The null hypothesis is commonly referred to as the white noise hypothesis. 

In Section 6, we estimate the hyper-parameters of the proposed tests under the 
assumption that the cross section series are mutually independent; namely, the N series 
are independent. However, the following simulation studies both the independent 
and dependent cases. In general, we assume a multi-factor error structure (Pesaran 
et al., 2013), which includes both independent and dependent cases, in order to check 
the robustness of the proposed tests with respect to cross section dependence. It 
turns out that in both the independent case and the dependent case, the proposed tests 
improve uniformly over the t-test. Hence in both cases, the proposed tests apparently 
“borrow the strength” from all the populations to do better than the t-test. 

Specifically, the data are generated using the model 

$y_{j,t} = \phi_j y_{j,t-1} + e_{j,t}$ for $1 \le j \le N$, with $\phi_j = 0$ for $j > N_1$, (59) 

$e_{j,t} = c_{j,1} f_{1,t} + c_{j,2} f_{2,t} + \varepsilon_{j,t}$, (60) 

where $\varepsilon_{j,t}$, $1 \le j \le N$, $1 \le t \le T$, are independently $N(0, \sigma_j^2)$ distributed. 

Model (60) is called a multi-factor model, which reduces to the independent model 
when $c_{j,1} = c_{j,2} = 0$. Otherwise, $\{e_{j,t}\}$, for $1 \le j \le N$, are dependent. For the 
dependent case studied below, $c_{j,1}$ and $c_{j,2}$ are generated as random samples from the 
uniform distribution over [0,1] and [0, 2] respectively. 
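
The data-generating model can be sketched in a few lines. The code below is an illustration only: it assumes the two common factors are i.i.d. standard normal and that every series starts from $y_{j,0} = 0$, neither of which is stated in the text, and the function name is hypothetical.

```python
import random

def simulate_panel(phis, sigmas, loadings, T, rng):
    """Simulate y[j][t] = phi_j * y[j][t-1] + e[j][t] with a two-factor error.

    loadings[j] = (c_j1, c_j2); zero loadings give cross-sectionally independent series.
    """
    # Assumption: both common factors are i.i.d. N(0, 1) over time.
    f1 = [rng.gauss(0.0, 1.0) for _ in range(T)]
    f2 = [rng.gauss(0.0, 1.0) for _ in range(T)]
    panel = []
    for phi, sigma, (c1, c2) in zip(phis, sigmas, loadings):
        prev, series = 0.0, []  # assumption: y_{j,0} = 0
        for t in range(T):
            e = c1 * f1[t] + c2 * f2[t] + rng.gauss(0.0, sigma)
            prev = phi * prev + e
            series.append(prev)
        panel.append(series)
    return panel

if __name__ == "__main__":
    rng = random.Random(0)
    N, N1, T = 6, 3, 10
    phis = [0.4] * N1 + [0.0] * (N - N1)  # the first N1 series are true alternatives
    loadings = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 2.0)) for _ in range(N)]
    panel = simulate_panel(phis, [1.0] * N, loadings, T, rng)
    print(len(panel), len(panel[0]))
```

Because the factors are shared across all series, any nonzero loading induces correlation between the errors of different series at the same time point.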

For the t-statistic, $F_{SV}$, $F_{SM}$ and $F_{SS}$, we then calculate the average power $ETP/N_1$ 
and the average type one error $EFP/(N - N_1)$, where ETP and EFP are defined in 
(8) and (9). 
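
In a simulation, these two quantities can be estimated by averaging true and false positive counts over replications. The sketch below assumes, as in the data-generating model, that the first $N_1$ series are the true alternatives; the function name and input layout are hypothetical.

```python
def average_power_and_type_one_error(rejections, n_alt):
    """Estimate ETP / N_1 and EFP / (N - N_1) from simulated rejection decisions.

    rejections: one list of booleans per replication (True = H0 rejected);
    the first n_alt entries of each list correspond to true alternatives.
    """
    reps = len(rejections)
    n = len(rejections[0])
    # Average number of true positives and false positives per replication.
    etp = sum(sum(1 for r in rep[:n_alt] if r) for rep in rejections) / reps
    efp = sum(sum(1 for r in rep[n_alt:] if r) for rep in rejections) / reps
    return etp / n_alt, efp / (n - n_alt)

if __name__ == "__main__":
    decisions = [
        [True, True, False, False, True, False],
        [True, False, False, False, False, False],
    ]
    print(average_power_and_type_one_error(decisions, 3))
```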

The parameters $\sigma_j$, $1 \le j \le N$, are i.i.d. samples generated from $\pi_1(\sigma)$ and $\phi_j$, 
$1 \le j \le N_1$, are i.i.d. samples generated from $\pi_2(\phi)$, where $\pi_1$ and $\pi_2$ will be specified 
below. 

Normal prior distributions 

The prior distributions assumed are 

$\pi_1(\sigma)$: $\ln(\sigma_j^2) \sim N(\mu_V, \tau_V^2)$ for $1 \le j \le N$, 

$\pi_2(\phi)$: $\phi_j \sim N(\mu, \tau^2)$ for $1 \le j \le N_1$. (61) 

We now examine the average power and the average type one error of the t-test 
and our proposed tests. In each of Figures 1.1 through 1.6, the simulated 
average power and the average type one error are plotted against $\mu$, as a solid 
curve and a dotted curve respectively. In the simulation, each point is based on 
averaging at least 4,000 replications. 

In Figures 1.1 through 1.5, the cross section series are mutually independent. 
For various settings of T, N and $N_1$, $\tau$ and the coefficient of variation (CV = 
$\tau_V/\mu_V$) specified in the headings of these figures, the figures basically show that 

statement (i): all the proposed tests have uniformly higher average power than the t-test, 

statement (ii): the uniformly most powerful test is the $F_{SS}$ test. (62) 

Moreover, the average power of $F_{SS}$ can be as much as 70% larger than that of the 
t-test, as shown in Figure 1.1. 


Moreover, all the tests have average type one error controlled under the 5% level 
with the exception of Figure 1.5, which corresponds to small N and $N_1$. Further 
numerical study shows that the discrepancy is due to the estimation error of $\mu$ 
and $\tau$, which is larger for small N and $N_1$. However, even in Figure 1.5, the 
average type one errors of the alternative tests are only slightly larger than 0.05. 

As for the case of cross section dependence, we adopt the multi-factor model 
(60) to generate the data. Under the same settings of T, N and $N_1$, $\tau$ and 
CV as those in Figures 1.1 through 1.5, we obtain very similar graphs showing 
basically that statements (i) and (ii) in (62) hold. We only report Figure 
1.6, which has the setting of Figure 1.1. 

In fact, the study shows that the improvements of Fss test over the t-test are 
slightly larger in some of the dependent cases. This is intuitively reasonable 
since a procedure shrinking toward the common means or variances should be 
expected to do better when the sections are more correlated. 

Our simulation studies also confirm the effectiveness of the estimator for the 
hyper-parameters $(\mu, \tau^2)$ in Section 6.1. More specifically, the average power of 
the proposed tests using the estimated $(\mu, \tau^2)$ is very similar to that of the tests 
using the true values, although the average power corresponding to the true 
values is not reported here. 

In Econometrics, it is important to focus on alternative hypotheses which 
are close to the null hypothesis. This is especially true for the unit root test, to 
be discussed in Section 7.2. Consequently, the tests do not have large average 
power. However, an increase of the average power by 0.05 will, on average, 
increase the detected true positives by $0.05 N_1$, which could be quite substantial 
when N is large. 
Uniform prior distributions and fixed effect model 

To show the robustness of the proposed tests with respect to the misspecification 
of prior distributions, we use “wrong” distributions such as the uniform distributions 
and the fixed effect model to generate parameters. We consider the uniform 
distributions 

$\pi_1(\sigma)$: $\ln(\sigma_j^2) \sim U(2 - 2\sqrt{3}\tau_V,\ 2 + 2\sqrt{3}\tau_V)$ for $1 \le j \le N$, 

$\pi_2(\phi)$: $\phi_j \sim U(\mu - 2\tau,\ \mu + 2\tau)$ for $1 \le j \le N_1$. (63) 

We write the distribution of $\ln(\sigma_j^2)$ this way so that the variance is $4\tau_V^2$ and 
the mean is two; consequently CV = $\tau_V$. For such a prior, we plot the average 
power using the same settings as Figures 1.1 through 1.5 for both the independent 
case and the multi-factor model. The resultant graphs are similar to Figures 1.1 
through 1.5. We report only Figure 2.1 (corresponding to the independent 
case) and Figure 2.2 (corresponding to the multi-factor model), both having the 
same settings as Figure 1.1. Both figures and the unreported figures basically 
confirm the two statements in (62). 
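
For reference, the moments of the uniform prior for $\ln(\sigma_j^2)$ follow from the standard formulas for $U(a, b)$. The short derivation below assumes the endpoints $a = 2 - 2\sqrt{3}\tau_V$ and $b = 2 + 2\sqrt{3}\tau_V$.

```latex
\[
E = \frac{a+b}{2} = 2, \qquad
\mathrm{Var} = \frac{(b-a)^2}{12} = \frac{(4\sqrt{3}\,\tau_V)^2}{12} = 4\tau_V^2, \qquad
\mathrm{CV} = \frac{\sqrt{4\tau_V^2}}{2} = \tau_V .
\]
```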

To study the fixed effect model, i.e. $\phi_j$ being fixed, let 

$\phi_j = \mu - 2\tau$ for $1 \le j \le N_1/2$, 

$\phi_j = \mu + 2\tau$ for $N_1/2 + 1 \le j \le N_1$, 

and $\sigma_j = \sigma$ for all j (CV = 0). The results displayed in Figures 2.3 and 2.4 
show that the improvements obtained by the proposed tests are also robust 
with respect to this “wrong” setting. The statements in (62) basically hold. 


Conditional heteroscedasticity 

Below, we shall generate $\varepsilon_{j,t}$ from a GARCH(1,1) model instead of an i.i.d. Normal 
model. The GARCH(1,1) model is commonly used in Finance and Economics 
to describe conditionally heteroscedastic phenomena. The GARCH(1,1) 
model used is 


Cj^t = where = 1 + O.lhe^^.p (65) 

where e*. are i.i.d. standard normal random variables and tcA is the condi- 
tional variance of ej^t- Models ([59]), fl60l) and fl6Tll are still assumed except ej^t 
follows (65). The results in Figures 3.1 and 3.2 show that the proposed tests 
still improve on the t-test. In particular, Figure 3.2 assumes a model that has 
cross section dependence and conditional heteroscedasticity. Therefore the improvements 
are quite robust with respect to dependence and heteroscedasticity. 
The statements in (62) are basically correct. 

Large dimensional series 

In what follows, we study large dimensional series with N = 1000 and 
$N_1$ = 500. Under the settings of Figure 1.1, Figures 4.1 and 4.2 report the 
results of the independent and dependent cases, respectively. Both figures strongly 
confirm the two statements in (62) and show that the improvements provided 
by the proposed tests, including $F_{SV}$, $F_{SM}$ and $F_{SS}$, over the t-test increase 
slightly when the dimension increases. 

7.2 The unit root hypothesis: One-sided test for $\phi_0 = 1$ 

Now we apply all the tests to the unit root hypothesis, for testing $H_0$: $\phi_j = 1$ vs. $H_1$: 
$\phi_j < 1$ for $1 \le j \le N$. 

We generate the data using the model 

$y_{j,t} = \phi_j y_{j,t-1} + e_{j,t}$ for $1 \le j \le N$, with $\phi_j = 1$ for $j > N_1$, (66) 

where the $e_{j,t}$'s are generated from equation (60), the $\varepsilon_{j,t}$'s from a normal distribution $N(0, \sigma_j^2)$, 
the $\sigma_j$'s from a prior distribution $\pi_1(\sigma)$ for $1 \le j \le N$, and the $\phi_j$'s from $\pi_2(\phi)$ for $1 \le j \le N_1$, 
where $\pi_1$ and $\pi_2$ are specified below. 

We shall graph the average powers and the average type one errors of the t-test, $RF_{SV}$ 
and $RF_{SS}$. However, we do not show the results of $RF_{SM}$ since its performance is 
very similar to (but slightly worse than) that of the t-test. 

Truncated normal prior distributions 

We generate parameters from the prior distributions used to derive the proposed tests. 
That is, $\ln(\sigma_j^2)$ for $1 \le j \le N$ have the $N(\mu_V, \tau_V^2)$ distribution, and $\phi_j$, $1 \le j \le N_1$, 
follow a $N(\mu, \tau^2)$ distribution truncated to the range $(-\infty, 1)$. 

In Figures 5.1 through 5.5, we graph the simulated average power (plotted by 
solid lines), and the simulated average type one error (plotted by dotted lines) of 
the three tests under the various combinations of T, N, $N_1$, CV and $\tau$ specified 
in the headings. These figures deal with the cases when the cross sections are 
independent, namely model (60) with $c_{j,1} = c_{j,2} = 0$ for all j. These graphs 
demonstrate that the following statement (iii) holds: 

statement (iii): $RF_{SS}$ and $RF_{SV}$ basically have uniformly higher average power 
than the t-test. (67) 

Hence statement (i) in (62) basically holds with $RF_{SV}$ and $RF_{SS}$. Regarding 
the question as to which test of the two is better, the answer is not clear. In 
principle, the test $RF_{SS}$ should perform better since it makes fuller use of 
the specifics of the priors. However, $RF_{SS}$ is not always the winner. This may 
be because more hyper-parameters need to be estimated in 
constructing $RF_{SS}$ than in constructing $RF_{SV}$, and estimation of the hyper-parameters 
is a difficult problem in the one-sided case because of the truncation of 
the prior. 

For the dependent case, we also produce results similar to Figures 5.1 to 5.5. 
However, only Figure 5.6 is reported which has the same settings as in Figure 
5.1. Figure 5.6, for the dependent multi-factor model, is very similar to Figure 
5.1 for the independent model. This demonstrates that the improvements of the 
proposed tests over the t-test are quite robust with respect to the cross section 
dependence. 

Whether one uses $RF_{SV}$ or $RF_{SS}$, these figures show that both have higher 
average power than the t-test. The average power of $RF_{SS}$ could be about 25% 
larger than that of the t-test (Figures 5.1 and 5.6). The average type one errors of all 
the tests are controlled under or nearly under the 5% level. 

Uniform prior distributions and fixed effect model 

Below we shall study different priors and models. In all cases, statement (iii) in 
(67) is shown to be true. Specifically, $RF_{SV}$ and $RF_{SS}$ have uniformly greater 
power than the t-test and there is no clear winner between $RF_{SV}$ and $RF_{SS}$. 

To study how improvements are affected by a “wrong prior”, we consider (63) 
except that $\phi_j$ is truncated so that $\phi_j$ is in $(-\infty, 1)$ for $1 \le j \le N_1$. Following 
the settings of Figures 5.1 through 5.5, we plot the average powers, which show 
that using the “wrong” prior has little effect. The resultant 
figures are very similar. We only report Figure 6.1 (similar to Figure 5.1) and 
Figure 6.2 (similar to Figure 5.6). 

Similar plots were produced for a fixed effect model where $\phi_j = \mu - 2\tau$ 
for $1 \le j \le N_1/2$ and $\phi_j = \min(0.99, \mu + 2\tau)$ for $N_1/2 + 1 \le j \le N_1$. Since in 
the graphs $\mu < 1$, the choice of $\phi_j$ above ensures that $\phi_j < 1$ for $1 \le j \le N_1$. 
Hence the first $N_1$ hypotheses are the alternative hypotheses. The rest of the 
hypotheses are the null hypotheses, where $\phi_j = 1$ for $j > N_1$. 

We report the results in Figures 6.3 and 6.4, which have the same settings as 
Figures 6.1 and 6.2, respectively. These two sets of graphs are very similar, 
confirming statement (iii). 

Conditional heteroscedasticity 

Figures 7.1 and 7.2 graph the average powers and average type one errors when 
the data are generated by equation (64) with the GARCH(1,1) error (65), and 
the parameters are generated by the truncated normal prior distributions. The 
results confirm statement (iii) and show that the proposed tests still improve on 
the t-test even when conditional heteroscedasticity and cross section dependence 
are present. 

Large dimensional series 

Figures 8.1 and 8.2 show results when the dimensions are N = 800 and 
$N_1$ = 600. The figures show that statement (iii) still holds when the dimension is 
large. 


8 Concluding Remarks 

To analyze the coefficients of a panel AR(1) model, we propose tests which determine 
which individual hypotheses should be accepted or rejected. Furthermore, our 
proposed tests improve on the t-test under the criterion of average power. We derive 
them using the empirical Bayes approach and then use approximations to obtain our 
proposed tests, which have a form similar to the t-test. The only difference is that, in 
our proposed tests, the estimators of the means and variances are replaced by shrinkage 
estimators. The proposed tests “borrow the strength” from all the series to test 
each individual series, resulting in more power. Simulation studies show that 
the proposed tests have significant improvements over the t-test, especially when the 
sample size T is small and the dimension N is moderate or large. Compared to the 
t-test, the average power of $F_{SS}$ and $RF_{SS}$ could be 70% higher in the two-sided test 
and 25% higher in the one-sided test, respectively. 

In this paper, we derive the tests under the assumption that the series are independent, 
and show that “borrowing the strength” from independent series will improve on the 
average power of the t-test. However, simulation demonstrates that the improvement 
is robust with respect to cross section dependence. This is only reasonable. A 
procedure that can do well by “borrowing the strength” even from independent series 
can certainly do so from dependent series. In this paper, we only work with the AR(1) 
model; we, however, anticipate that these results can be generalized to other more 
complex time series models. Since the proposed tests can determine acceptance or 
rejection of an individual hypothesis, this should prove to be a very useful method in 
practice. 

Appendix 

Proof of Theorem 1: The difference in (11) equals 


1 {Mj&D} 


( 68 ) 


where $g_j(\sigma_j) = P(y_j \in C) = \int_{y \in C} f(y \mid \phi_j, \sigma_j)\,dy$ and $E(g_j(\sigma_j)) = \int g_j(\sigma)\,d\pi_1(\sigma)$. 

Since the variance of $g_j(\sigma_j)$ is at most $E(g_j^2(\sigma_j)) \le 1$, the variance of (68) is bounded above 
by $N_1/N_1^2 = 1/N_1$, which converges to zero. Hence (68) converges in probability 
to zero by the law of large numbers, completing the proof of (11). 

Equation (12) can be proved similarly, except noting that $\int\int_{y \in C} f(y \mid \phi_0, \sigma_j)\,dy\,d\pi_1(\sigma_j)$ 
does not depend on j. 

Proof of Theorem 3: It is sufficient to show that 



(69) 


Note that the left-hand side is obviously bounded by the right-hand side since 
$f(y \mid \phi_j, \sigma) \le f(y \mid \phi_M, \sigma)$ for $\phi_j \in D$. Also, replacing $\phi_j$ by $\phi_M$ on the left-hand 
side leads to a lower bound, which is exactly the right-hand side, establishing 
(69) and the theorem. 